Abstract

Numerical evaluation of the overlap Dirac operator is difficult since it contains the sign function
$\epsilon(H_w)$ of the Hermitian Wilson-Dirac operator $H_w$ with a negative mass term. The problems are
due to having very small eigenvalues on the equilibrium background configurations generated 
in current day Monte Carlo simulations. Since these are a consequence of the lattice discretisation 
and do not occur in the continuum version of the operator, we investigate in this paper to what 
extent the numerical evaluation of the overlap can be accelerated by making the Wilson-Dirac 
operator more continuum-like. Specifically, we study the effect of including the clover term in the 
Wilson-Dirac operator and smearing the link variables in the irrelevant terms. In doing so, we have 
obtained a factor of two speedup by moving from the Wilson action to a FLIC (Fat Link Irrelevant 
Clover) action as the overlap kernel.

I. INTRODUCTION


The overlap formalism [?] leads, in the vector case, to a lattice formulation of
QCD based on the overlap Dirac operator [?], given (in the massless case) by

    $D_o = \frac{1}{2a}\left(1 + \gamma_5\,\epsilon(H_w)\right), \qquad \epsilon(H) = \frac{H}{\sqrt{H^2}}$    (1)

($a$ = lattice spacing), where

    $H_w = \gamma_5\left(D_w - \frac{m}{a}\right)$    (2)

is a hermitian operator constructed from the Wilson-Dirac operator $D_w$, with $m$ being
a tuning parameter.¹ The free field propagator of $D_o$ has the correct continuum limit and
is free of doublers when $0 < m < 2$. Because of its origin in the overlap formalism, $D_o$
has good chiral properties [?]; this can also be seen from the fact that it satisfies [?] the
Ginsparg-Wilson relation [?]:

    $\gamma_5 D + D \gamma_5 = 2a\, D \gamma_5 D$.    (3)

Lattice Dirac operators satisfying this relation have an exact, lattice-deformed chiral
symmetry [?], can have exact zero modes with definite chirality [?], as well as absence of mass
renormalisation and other promising theoretical properties [?].

The nice theoretical properties of the overlap Dirac operator come at a price: numerical
evaluation of it via polynomial approximation is difficult due to the discontinuity at the origin
of the matrix sign function $\epsilon(H)$. Practical methods have been developed in which $\epsilon(H)$ is
approximated by a sum over poles $\epsilon_N(H)$, using either the so-called polar decomposition or
the optimal rational polynomial approximation [?], both of which take the form

    $\epsilon_N(H) = H\left[c_0 + \sum_{k=1}^{N} \frac{c_k}{H^2 + d_k}\right]$.    (4)

The two approximations differ only in their choice of coefficients $\{c_0, c_k, d_k\}$, and both are
evaluated (indirectly) using a multi-shift Conjugate Gradient (CG) matrix inverter to
calculate their action on a vector. This is an iterative procedure where each iteration requires
one evaluation of the matrix operator $H^2$ acting on a vector (i.e. two evaluations of $H$), and
the number of iterations required to reach a given solution precision is proportional to the
condition number of $H$, $\kappa(H) = |\lambda_{max}/\lambda_{min}|$, which is the ratio of the largest eigenvalue of
$H$ to the smallest eigenvalue [16].
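As an illustration of how one Krylov space can serve every pole of Eq. (4), the following is a minimal sketch of a multi-shift CG solver in Python with NumPy. It is not the production code used for the results in this paper: the function name `multishift_cg`, the dense random test matrix and the relative stopping tolerance are all assumptions made for the example. The shifted residuals are never formed; each shift is tested through the scalar factor relating its residual to the unshifted one.

```python
import numpy as np

def multishift_cg(A, b, shifts, tol=1e-10, max_iter=500):
    """Solve (A + s*I) x_s = b for all s in `shifts` using a single Krylov space.

    A must be symmetric positive definite and every shift non-negative.
    Returns the solutions (one row per shift) and the iteration at which
    each shift was declared converged (0 means not converged).
    """
    shifts = np.asarray(shifts, dtype=float)
    ns = len(shifts)
    x = np.zeros((ns, b.size))
    p_s = np.tile(b.astype(float), (ns, 1))   # shifted search directions
    r = b.astype(float).copy()                # residual of the unshifted system
    p = r.copy()
    zeta_prev, zeta = np.ones(ns), np.ones(ns)
    beta_prev, alpha = 1.0, 0.0
    active = np.ones(ns, dtype=bool)
    converged_at = np.zeros(ns, dtype=int)
    rr = r @ r
    b_norm = np.sqrt(rr)
    for n in range(1, max_iter + 1):
        Ap = A @ p
        beta = -rr / (p @ Ap)
        s, z0, zm = shifts[active], zeta[active], zeta_prev[active]
        # Recurrence for the factor relating shifted and unshifted residuals.
        z_next = z0 * zm * beta_prev / (
            beta * alpha * (zm - z0) + zm * beta_prev * (1.0 - s * beta))
        beta_s = beta * z_next / z0
        x[active] -= beta_s[:, None] * p_s[active]
        r = r + beta * Ap
        rr_new = r @ r
        alpha = rr_new / rr
        p = r + alpha * p
        alpha_s = alpha * z_next * beta_s / (z0 * beta)
        p_s[active] = z_next[:, None] * r + alpha_s[:, None] * p_s[active]
        zeta_prev[active], zeta[active] = z0, z_next
        rr, beta_prev = rr_new, beta
        # Per-shift test: |zeta| * ||r|| is the norm of the shifted residual.
        done = active & (np.abs(zeta) * np.sqrt(rr) < tol * b_norm)
        converged_at[done] = n
        active &= ~done
        if not active.any():
            break
    return x, converged_at

# Toy problem (hypothetical data): a random SPD matrix standing in for H^2.
rng = np.random.default_rng(1)
M = rng.standard_normal((30, 30))
A = M.T @ M + np.eye(30)
b = rng.standard_normal(30)
shifts = [0.0, 0.5, 2.0, 10.0]
x, converged_at = multishift_cg(A, b, shifts)
```

Larger shifts correspond to better conditioned systems, so they drop out of the update loop earlier; only the smallest shift runs for the full number of iterations.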

Triangle inequalities lead to an upper bound [?] given by $|\lambda_{max}| \le (8 - m)/a$ for the
operator in Eq. (2). The lower bound on $|\lambda_{min}|$ can be zero though. The lattice gauge fields
for which $\lambda_{min} = 0$ form a subspace of measure zero in the space of all lattice gauge fields,
so it is exceedingly unlikely that one would ever encounter them in a numerical simulation.
However, our practical experience is that |Amax| ^ 8 while |Amin| is often as small as 10“^. 
This results in an unacceptably large value for the condition number $\kappa(H)$. There is a
way to get around this problem though [?]. The typical spectrum of $H_w$ is characterised
by a handful of isolated low-lying eigenmodes, so one can project these out and deal with
them explicitly. The condition number for the remaining part of the spectrum is then small
enough that the approximation in Eq. (4) becomes feasible. In practical simulations, after
projecting out the isolated low-lying modes, $\epsilon_N(H_w)$ takes roughly speaking O(100-300)
iterations to converge for $N \approx 14$, meaning that overlap fermions with the standard $H_w$ are
about O(200-600) times more expensive than standard Wilson fermions.

¹ We are assuming that the Wilson parameter has been set to its canonical value $r = 1$.
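The effect of projecting out isolated low modes can be sketched on a toy matrix. The example below is an illustration only: the 40-dimensional spectrum (three isolated small eigenvalues plus a bulk in 0.2 ≤ |λ| ≤ 1) is invented, the low modes are obtained from a full dense diagonalisation rather than an iterative eigensolver, and the pole sum uses the polar decomposition coefficients with dense solves instead of a multi-shift CG. The function names are hypothetical.

```python
import numpy as np

def polar_sign_apply(H, psi, N=14):
    """Apply the N-pole polar approximation of sign(H) to psi (dense solves)."""
    k = np.arange(1, N + 1)
    theta = np.pi * (k - 0.5) / (2 * N)
    c, d = 1.0 / (N * np.cos(theta) ** 2), np.tan(theta) ** 2
    H2, eye = H @ H, np.eye(H.shape[0])
    acc = sum(ck * np.linalg.solve(H2 + dk * eye, psi) for ck, dk in zip(c, d))
    return H @ acc

def projected_sign_apply(H, psi, n_proj, N=14):
    """Treat the n_proj lowest modes exactly and the remainder with the pole sum."""
    evals, evecs = np.linalg.eigh(H)
    low = np.argsort(np.abs(evals))[:n_proj]
    V, lam_low = evecs[:, low], evals[low]
    coeff = V.T @ psi
    exact_part = V @ (np.sign(lam_low) * coeff)   # sign(lambda_i) on each low mode
    remainder = psi - V @ coeff                   # component orthogonal to low modes
    return exact_part + polar_sign_apply(H, remainder, N)

# Toy spectrum (invented): three isolated low modes plus a bulk.
rng = np.random.default_rng(0)
bulk = np.linspace(0.2, 1.0, 37) * np.where(np.arange(37) % 2 == 0, 1.0, -1.0)
lam = np.concatenate(([1e-4, -3e-4, 1e-3], bulk))
Q, _ = np.linalg.qr(rng.standard_normal((40, 40)))
H = (Q * lam) @ Q.T
H = 0.5 * (H + H.T)                               # remove rounding asymmetry
psi = Q @ np.ones(40) / np.sqrt(40)               # equal weight on every mode
exact = Q @ (np.sign(lam) * (Q.T @ psi))

err_plain = np.linalg.norm(polar_sign_apply(H, psi) - exact)
err_projected = np.linalg.norm(projected_sign_apply(H, psi, n_proj=3) - exact)
kappa_full = np.abs(lam).max() / np.abs(lam).min()
kappa_deflated = np.abs(lam).max() / np.sort(np.abs(lam))[3]
```

In this toy case the condition number seen by the pole sum drops from `kappa_full` to `kappa_deflated` once the three isolated modes are handled explicitly, and the approximation error on the test vector falls accordingly.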

Obviously it is desirable to improve upon this situation in order to make simulations with
overlap fermions more feasible. In this paper we investigate ways to do this by modifying
the operator $H_w$ in the overlap formula in Eq. (1) so that its spectral properties are
improved. The improvements we seek are twofold: (i) an upward shift in the magnitude of the
low-lying eigenvalues of $H_w$, so as to decrease the condition number, and (ii) a reduction in
the density of low-lying eigenvalues, so as to make the projection method of Ref. [?] more
efficient. Furthermore, our aim is to produce an implementation of the overlap formalism
that will perform efficiently on large-scale parallel computing architectures. On such
architectures, the cost of internode communication is typically high compared to the cost of
intranode computation. We therefore demand that our candidate $H$ be no less sparse than
the Hermitian Wilson-Dirac operator, that is, possess at most nearest neighbour couplings.


II. FERMION ACTIONS 


The continuum version $H_c = \gamma_5(\not\partial - \frac{m}{a})$ of $H_w$ has the lower bound $|\lambda_{min}| \ge \frac{m}{a}$ since
$H_c^2 = -\not\partial^2 + (\frac{m}{a})^2$. Hence the near-zero values of the lowest eigenvalues of $H_w$ on equilibrium
backgrounds at currently accessible values of $\beta$ are a result of the lattice discretisation. Our aim
is to shift the lowest eigenvalues away from zero by making $H_w$, or more specifically, the
Wilson-Dirac operator in $H_w$, more continuum-like. In the framework of nonperturbative
improvement of lattice operators (see, e.g., [?]), $O(a)$ lattice artifacts in $D_w$ are removed by
adding the clover term of Ref. [?]. A simple heuristic argument for why this should be
beneficial in the present situation is the following. We write the Wilson-Dirac operator as

    $D_w = \not\nabla + \frac{a}{2}\Delta$,    (5)

where the naive lattice Dirac operator $\not\nabla$ and lattice Laplace operator $\Delta$ are given by

    $\not\nabla_{x,x'} = \frac{1}{2a}\sum_\mu \gamma_\mu\left[U_\mu(x)\,\delta_{x+e_\mu,x'} - U_\mu^\dagger(x-e_\mu)\,\delta_{x-e_\mu,x'}\right]$,    (6)

    $\Delta_{x,x'} = \frac{1}{a^2}\sum_\mu\left[2\,\delta_{x,x'} - U_\mu(x)\,\delta_{x+e_\mu,x'} - U_\mu^\dagger(x-e_\mu)\,\delta_{x-e_\mu,x'}\right]$.    (7)

The $\gamma$ matrices are chosen to be hermitian, so $\not\nabla$ is antihermitian and $\Delta$ is hermitian and
positive. Define the operator $C$ by the relation

    $\not\nabla^2 = \nabla^2 + C$    (8)

(where $\nabla^2 = \sum_\mu \nabla_\mu \nabla_\mu$). Then $C = \frac{1}{4}[\gamma_\mu, \gamma_\nu][\nabla_\mu, \nabla_\nu]$ is $\sim O(a)$ and coincides with the usual
clover term (with coefficient $c_{sw} = 1$, the tree level value) up to an $O(a^2)$ term. Here and in
the following $O(a^p)$ denotes a lattice operator whose leading term in a formal expansion in
powers of the lattice spacing is $\sim a^p$. Now, setting the parameter $m$ in $H_w$ to its canonical
value $m = 1$ we have

    $H_1^2 = (D_w - \frac{1}{a})^\dagger (D_w - \frac{1}{a}) = -\not\nabla^2 - \Delta + (\frac{a}{2}\Delta)^2 + \frac{a}{2}[\Delta, \not\nabla] + \frac{1}{a^2}$.    (9)

Straightforward calculations show that $\Delta + \nabla^2 \sim O(a^2)$ and $[\Delta, \not\nabla] \sim O(a)$; hence, by Eq.
(8), we have $\not\nabla^2 + \Delta = C + O(a^2)$. Hence we obtain the lower bound

    $H_1^2 \ge \frac{1}{a^2} - C - O(a^2) = \frac{1}{a^2} - O(a)$.    (10)

Thus the lower bound $\frac{1}{a^2}$ on the continuum version of $H_1^2$ is spoiled in the lattice case by
an $O(a)$ term. If we now add $C$ to $\Delta$ in (5), i.e. replace

    $D_w \to D_{cw} = \not\nabla + \frac{a}{2}(\Delta + C)$,    (11)

we find

    $H_1^2 = (D_{cw} - \frac{1}{a})^\dagger (D_{cw} - \frac{1}{a})$
          $= -\not\nabla^2 - (\Delta + C) + (\frac{a}{2}(\Delta + C))^2 + \frac{a}{2}[\Delta + C, \not\nabla] + \frac{1}{a^2}$
          $\ge \frac{1}{a^2} - O(a^2)$.    (12)

Hence the $O(a)$ term ($-C$) in Eq. (10) has dropped out and the continuum lower bound $\frac{1}{a^2}$
is now only spoiled by an $O(a^2)$ term.
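The expansion in Eq. (9) follows in two lines once the hermiticity properties are used. The step is spelled out below as a worked equation; the shorthand $B$ is introduced only for this derivation and does not appear elsewhere in the paper.

```latex
% Shorthand (assumption for this derivation only): B = (a/2)\Delta - 1/a, which is hermitian.
% Since \not\nabla is antihermitian, (D_w - 1/a)^\dagger = -\not\nabla + B.
\begin{align}
H_1^2 &= (-\not\nabla + B)(\not\nabla + B)
       = -\not\nabla^2 + [B, \not\nabla] + B^2 , \\
B^2   &= \Big(\frac{a}{2}\Delta\Big)^2 - \Delta + \frac{1}{a^2} , \qquad
[B, \not\nabla] = \frac{a}{2}\,[\Delta, \not\nabla] .
\end{align}
% Collecting terms reproduces the right-hand side of Eq. (9).
```

The cross term $-\Delta$ in $B^2$ arises from $2 \cdot \frac{a}{2}\Delta \cdot (-\frac{1}{a})$, which is why any operator added to $\Delta$ inside $D_w$ enters $H_1^2$ at leading order with a relative minus sign.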

However, it is well-known that adding a clover term only improves the chiral properties
of the Wilson-Dirac operator on smooth backgrounds, and that the localisation of the real
eigenvalues of $H_{cw}$ is actually worse than for $H_w$ on typical gauge backgrounds generated
in Monte Carlo simulations [?]. This suggests that to further improve the chiral
properties of $H_{cw}$ we should consider smoothing the lattice gauge field. In Ref. [?], DeGrand
et al. found that a significant improvement in the chiral properties can be achieved by
applying an APE smearing procedure [?] to the link variables, leading to a fat
link version of $H_{cw}$. (The idea of using fat links in fermion actions was first explored by the
MIT group [?].) More recently, Zanotti et al. have shown in [?] that such improvement can
be achieved by smearing only the link variables appearing in the irrelevant operators, i.e. in
the Wilson and clover terms. This has the advantage of preserving the short distance quark
interactions. (The idea of using fat links in the irrelevant operators had been independently
suggested previously in Ref. [?].)

Motivated by the preceding discussion, we compare the evaluation of the usual overlap
Dirac operator with the operators obtained by replacing $H_w$ in the overlap formula of Eq. (1)
with the following variants. (The lattice spacing has been set to 1 unless specified otherwise.)
(i) Hermitian Wilson-Dirac operator with clover term:

    $H_{cw}(m, c_{sw}) = \gamma_5\left(\not\nabla + \frac{1}{2}\left(\Delta - \frac{c_{sw}}{2}\,\sigma\cdot F\right) - m\right)$,    (13)

where

    $F_{\mu\nu}(x) = \frac{1}{2}\left(C_{\mu\nu}(x) - C_{\mu\nu}^\dagger(x)\right)$,    (14)

    $C_{\mu\nu}(x) = \frac{1}{4}\left(U_{\mu,\nu}(x) + U_{-\nu,\mu}(x) + U_{\nu,-\mu}(x) + U_{-\mu,-\nu}(x)\right)$.    (15)

(ii) Fat link Hermitian Wilson-Dirac operator, both with and without clover term:

    $H_{fw}(m, \alpha n_{ape}) = \gamma_5\left(\not\nabla + \frac{1}{2}\Delta^{fl} - m\right)$,    (16)

    $H_{fcw}(m, c_{sw}, \alpha n_{ape}) = \gamma_5\left(\not\nabla + \frac{1}{2}\left(\Delta^{fl} - \frac{c_{sw}}{2}\,\sigma\cdot F^{fl}\right) - m\right)$,    (17)

where APE-smearing is carried out on the individual links in the irrelevant operators by
making the replacement

    $U_\mu(x) \to U_\mu^{fl}(x) = \mathcal{P}\Big\{(1-\alpha)\,U_\mu(x) + \frac{\alpha}{6}\sum_{\nu \ne \mu}\big[U_\nu(x)\,U_\mu(x+ae_\nu)\,U_\nu^\dagger(x+ae_\mu)$
        $+\, U_\nu^\dagger(x-ae_\nu)\,U_\mu(x-ae_\nu)\,U_\nu(x-ae_\nu+ae_\mu)\big]\Big\}$.    (18)

Here $\mathcal{P}$ denotes projection of the RHS of Eq. (18) back to the SU(3) gauge group. That
is, each link is modified by replacing it with a combination of itself and the surrounding
staples to give a set of “fat links”. The means by which one projects back to SU(3) is
not unique. We choose an SU(3) matrix $U_\mu^{fl}(x)$ such that the gauge invariant measure
$\mathrm{Re\,Tr}(U_\mu^{fl}(x) X_\mu^\dagger(x))$ is maximal, where $X_\mu(x)$ is the smeared link before projection, that is
$U_\mu^{fl}(x) = \mathcal{P} X_\mu(x)$. As the process of APE-smearing removes short-distance physics, it is
preferable to only smear the irrelevant operators. Throughout this work “fat” means APE
smearing of links in irrelevant terms only. Here $\alpha$ is the smearing fraction and $n_{ape}$ is the
number of smearing sweeps [?] we perform. As shown in [?], we can effectively reduce the
two-dimensional parameter space $(\alpha, n_{ape})$ to a one-dimensional space that depends solely on
the product $\alpha n_{ape}$, and this is reflected in the notation in Eqs. (16)-(17).
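The structure of an APE smearing sweep can be sketched in a simplified setting. The example below is an illustration under stated assumptions, not the SU(3) code used in this work: it uses compact U(1) links (complex phases) on a small periodic 4D lattice, so the projection back to the group reduces to normalising to unit modulus, which is the U(1) analogue of maximising the real trace. The function names and the random starting configuration are hypothetical.

```python
import numpy as np

def shift(F, d, step):
    """Return the field F evaluated at x + step * e_d on a periodic lattice."""
    return np.roll(F, -step, axis=d)

def ape_smear_u1(U, alpha=0.7, sweeps=1):
    """APE-smear U(1) links U[mu, x0, x1, ...] (complex numbers of unit modulus)."""
    ndim = U.shape[0]
    U = U.copy()
    for _ in range(sweeps):
        new = np.empty_like(U)
        for mu in range(ndim):
            staples = np.zeros_like(U[mu])
            for nu in range(ndim):
                if nu == mu:
                    continue
                # Forward staple: U_nu(x) U_mu(x+nu) U_nu^*(x+mu)
                up = U[nu] * shift(U[mu], nu, 1) * np.conj(shift(U[nu], mu, 1))
                # Backward staple: U_nu^*(y) U_mu(y) U_nu(y+mu) at y = x - nu
                down_at_y = np.conj(U[nu]) * U[mu] * shift(U[nu], mu, 1)
                staples += up + shift(down_at_y, nu, -1)
            # 2*(ndim-1) staples per link, i.e. alpha/6 in four dimensions
            X = (1 - alpha) * U[mu] + alpha / (2 * (ndim - 1)) * staples
            new[mu] = X / np.abs(X)          # U(1) projection: unit modulus
        U = new
    return U

def mean_plaquette(U):
    """Average of Re(plaquette) over all sites and planes mu < nu."""
    ndim = U.shape[0]
    vals = [np.real(U[mu] * shift(U[nu], mu, 1)
                    * np.conj(shift(U[mu], nu, 1)) * np.conj(U[nu])).mean()
            for mu in range(ndim) for nu in range(mu + 1, ndim)]
    return float(np.mean(vals))

# Hypothetical rough configuration: small random phases on a 4^4 lattice.
rng = np.random.default_rng(0)
U = np.exp(1j * 0.05 * rng.standard_normal((4, 4, 4, 4, 4)))
U_fat = ape_smear_u1(U, alpha=0.7, sweeps=4)
```

All links are updated from the same unsmeared configuration within a sweep, so the result does not depend on the order in which the directions are visited.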

Finally, as in [?], we can perform tadpole or mean-field improvement (MFI) to bring
our links closer to unity. This consists of updating each link with a division by the mean
link, which is the fourth root of the average plaquette,

    $u_0 = \left\langle \frac{1}{3}\,\mathrm{Re\,Tr}\,U_{\mu\nu}(x) \right\rangle^{1/4}_{x,\,\mu<\nu}$.    (19)
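Eq. (19) amounts to a one-line computation once the plaquette values are available. The sketch below assumes the per-plaquette values of $\frac{1}{3}\mathrm{Re\,Tr}\,U_{\mu\nu}(x)$ have already been measured; the function name and the sample plaquette average of 0.6 are hypothetical and chosen only for illustration.

```python
import numpy as np

def mean_link(plaquette_values):
    """Mean link u0: fourth root of the average of (1/3) Re Tr U_munu(x)."""
    return float(np.mean(plaquette_values)) ** 0.25

# Hypothetical input: every plaquette equal to 0.6.
u0 = mean_link(np.full(10, 0.6))

# Mean-field improvement then divides each link by u0, e.g. for a product
# of four links (a plaquette) the rescaling factor is 1 / u0**4.
plaquette_rescaling = 1.0 / u0 ** 4
```

Because a plaquette contains four links, the rescaled average plaquette is exactly one by construction.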


In the case of $H_w$ and $H_{fw}$ mean-field improvement has little effect, entering in only as a
single power in both cases. For $H_w$, mean field improvement effectively changes the value of
$m$ and renormalises the Wilson parameter $r$. In the case of $H_{fw}$ it has a similar effect but we
have two mean link values, one for the untouched set of links and one for the smeared set.
However, $u_0$ enters in as the fourth power in front of the clover term, effectively raising $c_{sw}$
towards its non-perturbative value. Hence our final two variants of $H_w$ are the following.
(iii) MFI clover Hermitian Wilson-Dirac operator, both with and without fat links:

    $H_{cl}(m, c_{sw}) = \gamma_5\left(\frac{1}{u_0}\not\nabla + \frac{1}{2}\left(\frac{1}{u_0}\Delta - \frac{c_{sw}}{2u_0^4}\,\sigma\cdot F\right) - m\right)$,    (20)

    $H_{fcl}(m, c_{sw}; \alpha n_{ape}) = \gamma_5\left(\frac{1}{u_0}\not\nabla + \frac{1}{2}\left(\frac{1}{u_0^{fl}}\Delta^{fl} - \frac{c_{sw}}{2(u_0^{fl})^4}\,\sigma\cdot F^{fl}\right) - m\right)$,    (21)

where we have differentiated the mean link $u_0$ for the untouched links and $u_0^{fl}$ for the fat
links. We refer to the MFI fat clover action as the FLIC (Fat-Link Irrelevant Clover) action.
The FLIC action was recently introduced and studied in Ref. [?]. If followed by a number
(e.g. FLIC12) this denotes the number of APE-smearing sweeps (at $\alpha = 0.7$) used in the
action.

Before proceeding to the numerical results it is worth pointing out that the previous
analytical results on the locality and continuum limit of the axial anomaly [?]
and index of the overlap Dirac operator [?] continue to hold when $H_w$ is replaced by any
of the variants given above in the overlap formula. In the case of the axial anomaly and
index, this is essentially because the leading order term in the expansion of commutators
of the covariant finite difference operators in powers of the lattice spacing is unchanged,
and the variants of $H_w$ all coincide with $H_w$ in the free field case. Regarding locality, the
admissibility bounds of [?] on the plaquette variables get modified somewhat when
the different variants of $H_w$ are used. In light of the heuristic arguments above and our
numerical results below, we expect that it should be possible to derive improved (i.e., less
restrictive) bounds in these cases, although so far we have not been able to show this.

We also mention that more general variants of the overlap Dirac operator have been
considered where one starts with an approximate solution to the Ginsparg-Wilson relation
and then gets an exact solution by substituting into the overlap formula [41, 42].²
This has led to variants of the overlap action which are both easier to evaluate and more local
than the original. However, it is not clear whether such general operators will have the good
topological properties of the standard overlap Dirac operator, namely exact zero-modes with
definite chirality in topologically nontrivial backgrounds (cf. the counter-example of Chiu
[?]). This is important in connection with the lattice implementation of the Witten-Veneziano
formula for the $\eta'$ mass with GW fermions [?], since for the argument to work
it is essential that the would-be zero modes are exact zero modes.


III. SPECTRAL FLOW COMPARISON 

In order to test the merits of each of our proposed actions, we first calculate the spectral
flow of each of them to see if our reasoning regarding their low-lying spectra is valid. From
the quadratic form of the lower bounds as a function of $m$, and based upon results given
in Ref. [?], we expect there to be some peak value of $m$ for which the gap around zero is
the largest. We calculated the flow of the lowest 15 eigenvalues as a function of $m$ for an
ensemble of 10 mean-field improved Symanzik configurations at $\beta = 4.38$ and size $8^3 \times 16$.
The following flow graphs allow us to see the $m$ value for the biggest gap, and also allow
us to compare the different actions. As we are interested in the magnitude of the low-lying
values rather than their sign, we plot $|\lambda|$ vs $m$.
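The expectation of a peak in the gap as a function of $m$ can be illustrated in the free field case, where the spectrum is known in closed form. The sketch below is not the ensemble calculation described above: it assumes unit links, periodic boundary conditions, $a = r = 1$, and uses the standard momentum-space result $\lambda^2(p) = \sum_\mu \sin^2 p_\mu + \big(\sum_\mu (1 - \cos p_\mu) - m\big)^2$ for the Hermitian Wilson-Dirac operator. The function name is hypothetical.

```python
import numpy as np

def free_field_gap(m, shape=(8, 8, 8, 16)):
    """Smallest |lambda| of the free Hermitian Wilson-Dirac operator (a = r = 1)."""
    momenta = [2 * np.pi * np.arange(L) / L for L in shape]
    p = np.meshgrid(*momenta, indexing="ij")
    sin2 = sum(np.sin(q) ** 2 for q in p)        # naive Dirac term squared
    wilson = sum(1 - np.cos(q) for q in p)       # Wilson (Laplacian) term
    lam2 = sin2 + (wilson - m) ** 2
    return float(np.sqrt(lam2.min()))

# Scan of the gap over the doubler-free window 0 < m < 2.
masses = np.linspace(0.0, 2.0, 9)
gaps = [free_field_gap(m) for m in masses]
```

In this free case the gap is $\min(m, 2 - m)$ on lattices with even extents, so it closes at $m = 0$ and $m = 2$ and is largest at $m = 1$; on interacting backgrounds the location and height of the peak have to be found numerically, which is what the flow graphs are used for.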

We begin by examining the flow of the Wilson and clover action in Figure 1. We see the
Wilson spectrum is very poor, with a high density of very small eigenmodes and no gap away
from zero. The addition of the clover term (at $c_{sw} = 1$) provides some improvement, shifting
the flow upwards and moving the peak values towards $m = 1$ as expected. The presence of
many small eigenmodes persists however, although their density is clearly reduced.

FIG. 1: Spectral flow of the Wilson action (left) and the clover action (right) at $\beta = 4.38$.

In Figure 2 we examine the MFI clover and fat Wilson actions. Mean field improvement
assists the basic clover action somewhat, spreading the spectrum upwards, although the
lowest modes are not raised significantly. The mass value at which the low-lying density is
minimised has moved significantly away from $m = 1.2$ to around $m = 0.6$. As mentioned
earlier, essentially all MFI does in this case is to rescale $c_{sw}$ from 1.0 by inverse powers of
$u_0$, pushing it towards its non-perturbative value. Modifying the Wilson action by smearing
the irrelevant operators provides a considerable improvement. While there are still some small
modes present, their density has been greatly reduced, and the spectral flow now has a clear
division between the isolated low-lying modes and the modes where the spectral density becomes
high, which are well separated from zero. Smearing was performed with $\alpha = 0.7$ and $n_{ape} = 12$
smearing sweeps.

² Specifically, if $D_{approx}$ is some approximate solution to the GW relation then $A = 1 - D_{approx}$ satisfies
$A^\dagger A \approx 1$. An exact solution $D$ to the GW relation, which is approximately equal to $D_{approx}$, is then
obtained via the overlap formula by setting $D = 1 - A\frac{1}{\sqrt{A^\dagger A}}$.



FIG. 2: Spectral flow of the MFI clover action (left) and the fat Wilson action (right) at $\beta = 4.38$.

Results for the fat clover and FLIC12 actions are shown in Figure 3. The spectral flow
of the fat clover action clearly demonstrates the superiority of clover-improved actions. The
gap around zero is enhanced again over the fat Wilson action, and the number of isolated
low-lying modes is significantly reduced. As the fat links are already close to unity, the
addition of mean field improvement only affects the fat clover flow slightly, raising the gap
around zero a little and spreading the eigenvalues upwards slightly also. The low-lying
density is again very good in this case and far superior to that of the Wilson action.

FIG. 3: Spectral flow of the fat clover action (left) and FLIC12 action (right) at $\beta = 4.38$.

To confirm our results we choose the Wilson action as a “baseline” and compare it against
the FLIC action (the best of the alternative actions) on a larger, finer lattice, $12^3 \times 24$ at
$\beta = 4.60$. This time we only use 4 smearing sweeps in the FLIC action since FLIC4 has less
fattening and is the choice used in actual simulations [30]. We see that the Wilson action
benefits significantly from the smaller lattice spacing, as there is now a visible separation
from zero before the modes become dense. The FLIC action has the same characteristics
as on the coarser lattice, but it now has a peak separation of the dense modes from zero of
around 0.45.

FIG. 4: Spectral flow of the Wilson action (left) and FLIC4 action (right) at $\beta = 4.60$.


Additionally, we tested the dependence of the FLIC action upon the amount of smearing
done. As stated in [?], we only effectively need to vary the product $\alpha n_{ape}$, so we fix $\alpha$ at 0.7
and vary $n_{ape}$ between 0 and 12. We observe that the initial 4-6 sweeps have a significant
effect, but past 6 sweeps the effect is marginal, with the low-lying density remaining roughly
constant and the eigenvalues being compressed very slightly downwards.

FIG. 5: Dependence of the FLIC spectrum at $\beta = 4.60$, $m = 1.35$ (left) and $\beta = 4.38$, $m = 1.45$
(right) on the number of APE smearing sweeps.


IV. RESULTS 

Having obtained some understanding of the low-lying spectra of the various actions via the
flow diagrams, we now turn to quantitative comparisons. Firstly we examine the condition
number, $\kappa$, of the different actions as a function of $m$. We show below the condition numbers
having projected out the lowest 5 eigenmodes and the lowest 15 eigenmodes on the two lattices
that we used. The points are the mean condition numbers across the ensembles, and the
error bars indicate the minimum and maximum condition numbers, giving an idea of the
variation in $\kappa$. The smeared irrelevant-term actions here used 12 APE sweeps at $\alpha = 0.7$ for
the coarse lattice and 4 sweeps for the fine lattice. Some points are offset horizontally for
clarity.

FIG. 6: Condition numbers of the various actions. (Top-left) Results for $\beta = 4.38$ with 5 projected
modes. (Bottom-left) Results for $\beta = 4.38$ with 15 projected modes. (Top-right) Results for
$\beta = 4.60$ with 5 projected modes. (Bottom-right) Results for $\beta = 4.60$ with 15 projected modes.

Two things are immediately noticeable. Firstly, the smeared irrelevant-term actions
are much better conditioned than the unsmeared actions, and secondly, the variation of
$\kappa$ between configurations is smaller. It should be noted that the variations (error bars) are
displayed for all actions, but are smaller than the plot symbol at some points of the fat
clover and FLIC lines. Projecting out an additional 10 eigenvalues has a significant effect on
the unsmeared actions, but relatively little effect on the smeared actions due to the reduction
in the number of isolated low-lying values. In terms of condition number, the fat clover and
FLIC actions are clearly and significantly superior to the other actions, with the FLIC action
possessing a (slight) edge over the fat clover which arises from the mean field improvement.

As the clover term is quite fast to evaluate, we discard the fat Wilson as a candidate
action at this point as it is the least well-conditioned of the smeared actions. Given the
similarity between the clover-improved actions with and without mean-field improvement,
we focus on the MFI clover and FLIC actions. We now compare in detail the performance
for three actions: the Wilson, MFI clover and FLIC. To see how improving the condition
number translates into a saving in CG iterations, we calculated the number of Multi-CG
iterations required to evaluate $D_o$ once across the ensemble for each of these actions, using
some typical simulation parameters.

The Wilson and MFI clover are tested using the 14th order optimal rational polynomial
(ORP) approximation [?]. The improved condition number of the FLIC actions allows us
to use the 12th order polar decomposition, chosen to give a maximum deviation from $\epsilon(x)$
of less than 10“® compared to the 3.1 x 10“® of the 14th order ORP. The order polar 
decomposition is specified by

    $c_0 = 0, \qquad c_k = \frac{1}{N\cos^2\left(\frac{\pi}{2N}(k - \frac{1}{2})\right)}, \qquad d_k = \tan^2\left(\frac{\pi}{2N}(k - \frac{1}{2})\right)$.    (22)
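The coefficients of Eq. (22) are easy to tabulate and check numerically. The sketch below evaluates the scalar form of Eq. (4) with these coefficients; the function names and the sample points are hypothetical, and only scalars are treated here, whereas the actual calculation applies the same sum to the kernel via a multi-shift CG.

```python
import numpy as np

def polar_coefficients(N):
    """Return (c0, c_k, d_k) of Eq. (22) for the N-th order polar decomposition."""
    k = np.arange(1, N + 1)
    theta = np.pi * (k - 0.5) / (2 * N)
    return 0.0, 1.0 / (N * np.cos(theta) ** 2), np.tan(theta) ** 2

def sign_approx(x, N):
    """Scalar version of Eq. (4): x * (c0 + sum_k c_k / (x^2 + d_k))."""
    c0, c, d = polar_coefficients(N)
    x = np.asarray(x, dtype=float)
    return x * (c0 + np.sum(c / (x[..., None] ** 2 + d), axis=-1))

# Deviation from the exact sign function at a few sample points for N = 12.
xs = np.array([0.05, 0.1, 0.3, 1.0])
errors = np.abs(1.0 - sign_approx(xs, N=12))
```

For $0 < x < 1$ the sum equals $\tanh(2N\,\mathrm{artanh}\,x)$, so the deviation from unity falls off very quickly away from the origin. This is why a kernel whose spectrum is well separated from zero can use a lower order or converge in fewer iterations.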

Low-lying modes are projected out where necessary. The sign function solution is calculated 
to a precision of 10“® across the hne ensemble and the coarse ensemble used above. The 
value of $m$ is chosen differently for each of the actions to optimise $\kappa$. Given the relative lack
of improvement in using the MFI clover action compared to the Wilson, we discard it at
this point and concentrate on comparing the Wilson and FLIC actions. As the results in
Table I show, the FLIC action is by far the best in terms of convergence, with a reduction
in iterations compared to the Wilson action of a factor of between 1.9 and 2.4.

    Action       β      Projections   Mean   Min   Max
    Wilson       4.38   15            219    188   253
    Wilson       4.60   15            202    190   212
    MFI clover   4.38   15            200    178   240
    FLIC12       4.38   10             92     89   100
    FLIC6        4.38   10             90     86   101
    FLIC4        4.60   15            109    106   112

TABLE I: Conjugate Gradient (CG) iterations needed for a single evaluation of $\epsilon_N(x)$ using actual
simulation parameters.
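The quoted range of 1.9 to 2.4 can be reproduced directly from the mean iteration counts in Table I. The short sketch below compares each FLIC entry with the Wilson entry at the same $\beta$; the dictionary layout and variable names are only for illustration.

```python
# Mean CG iteration counts from Table I, keyed by (action, beta).
mean_iterations = {
    ("Wilson", 4.38): 219,
    ("Wilson", 4.60): 202,
    ("FLIC12", 4.38): 92,
    ("FLIC6", 4.38): 90,
    ("FLIC4", 4.60): 109,
}

# Ratio of Wilson to FLIC iterations at matching beta.
reduction = {
    action: mean_iterations[("Wilson", beta)] / count
    for (action, beta), count in mean_iterations.items()
    if action.startswith("FLIC")
}

smallest = min(reduction.values())   # FLIC4 on the fine lattice
largest = max(reduction.values())    # FLIC6 on the coarse lattice
```

The smallest reduction comes from the fine lattice, where the Wilson kernel itself is better conditioned, and the largest from FLIC6 on the coarse lattice.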

However, what is not clear from this is how the saving in iterations translates into the
most important quantity, a saving in compute time. Shifting from a standard Wilson action
to a partially smeared action means that we now have two sets of gauge fields, the standard
and smeared links. This doubles the number of vector multiplications needed, and the
standard spin-projection trick [?] is no longer applicable, providing an additional factor of
2 in both the multiplications needed and the communications needed. So, moving from
the Wilson action to the FLIC action costs us a factor of 4 in vector multiplications and a
factor of 2 in communications, plus the overhead for the clover term. On the other hand,
evaluating the action of $\epsilon_N(H)$ on a vector costs O(2N) vector multiplications in addition to
the two evaluations of the kernel, $H$. While vector multiplications form a significant part
of the cost of evaluating $H$, they are not the only part. There is a relatively high cost of
communication compared to computation on the parallel architectures that we wish to use.
It quickly becomes clear that the only real way to see how much of an improvement we have
made is to do an actual calculation and compare the compute time needed.

To test the actual speedup, we choose to calculate the low-lying eigenmodes of $D_o^\dagger D_o$
for the two different kernels, Wilson and FLIC. This calculation allows us to verify that both
kernels give the appropriate spectral properties [?], and also allows us to calculate directly
the relative compute time needed to evaluate $D_o$ in each case. For the Wilson action we
used the 14th order Rational Polynomial Approximation in the region in which it is bounded
by unity (0.025 < |a;| < 1.918) and where the maximal deviation from e{x) is 3.1 x 10“^. We 
used the mass parameter $m = 1.65$ and projected out 15 eigenvalues. For the FLIC action,
we can take advantage of the improved condition number without reducing the accuracy
of our approximation by using the polar decomposition at 12th order, which is sufficient
to provide a maximal deviation of less than 3.1 x 10“^. This saves us a (small) amount of 
computation. To optimise the condition number we choose to perform only 6 APE sweeps
with the mass parameter set to $m = 1.45$ and projecting out 10 eigenvalues. To minimise
the computation needed, we implement individual pole convergence testing in our Multi-CG
routine. The first pole is considered converged in the $n$-th iteration according to the
usual criterion based on the residue, ||r„|| < 6, where we chose 6 = 10“®. The convergence 
criterion for the other poles is easily deduced by noting the shifted polynomial structure of
the residual, $r_n^{i} = P_n(H^2 + \sigma(i))\, r_0 = \zeta_n^{i}\, P_n(H^2)\, r_0 = \zeta_n^{i}\, r_n$. Then the $i$-th pole is considered
converged if

    $\zeta_n^{i}\, \|r_n\| < 0.1 \times \delta$,    (23)

where $\zeta_n^{i}$ is defined as in Eq. (2.44) of Ref. [?] and $\delta$ is the tolerance used for the first
pole. We have tested this convergence criterion
by calculating individual residues and found it to be numerically very safe, and also to
save significant amounts of computation. We consider the ten $8^3 \times 16$, $\beta = 4.38$ lattices.
Computations are performed on 4 nodes of the Orion supercomputer (a Sun E420R cluster
comprising 40 nodes, with each node possessing 4 GB of RAM, 16 MB of L2-cache, and 4
UltraSPARC II 450 MHz processors, and with nodes connected by Myrinet networking).
The lowest 6 eigenmodes of $D_o^\dagger D_o$ are calculated on each configuration using the Ritz functional
method [?]. We measure the compute time spent in each of the different parts of the “inner-CG”
calculation, with the following results.


    Code portion                           Wilson      FLIC6
    1 Kernel-vector evaluation (H)         0.022 sec   0.037 sec
    1 Multi-CG iteration (including H)     0.133 sec   0.154 sec
    1 Multi-CG iteration (excluding H)     0.089 sec   0.079 sec
    1 overlap-vector evaluation            25.52 sec   13.67 sec

TABLE II: Actual compute time spent in the various parts of the algorithm.


The results show that using the FLIC action as the kernel in the overlap formalism
provides a saving of a factor of 1.9 in actual compute time spent in evaluating the overlap
action. This is easily understood by first observing that the time spent in the fermion
matrix multiplication constitutes less than half of the compute time spent in the inner CG
inversion. Secondly, we have only paid a factor of 2 in compute time moving from the Wilson
action to the FLIC action, not the potential factor of 4. This is because the time spent in
communication and performing the $\gamma$ matrix algebra is not negligible when compared to the
time spent in performing the gauge field multiplications. Finally, as the improved condition
number of the FLIC kernel allows us to use the 12th order polar decomposition, we expend
less effort per iteration in the CG component of the sign function evaluation. This is because
the number of unconverged poles per iteration is reduced, as demonstrated in Table III.


Pole Wilson ELIC6 

Pole Wilson ELIC6 

1 

1881^? 

851^^ 

8 

55l| 

19lJ 

2 

1881^1 

82+^° 

9 

39l^ 

utl 

3 

1881^? 

65« 

10 

281' 


4 

18813J 

501^ 

11 

19^1 

7^+0 
'-1 

5 

1611}^ 

3913 

12 

14l? 

0 0 
+ 1 

6 


3ll2 

13 

91? 

- 

7 

8o;5 

241^ 

14 

C + l 
^-0 

- 


TABLE III: Breakdown of the mean convergence for each of the poles. 
These facts mean that the overall compute time per inner CG iteration increases by only
15% when moving to the FLIC kernel, and hence the saving of 55% in the total number of
inner CG iterations needed translates into a saving in compute time. Thus we have shown
that the FLIC action is numerically superior to the Wilson action as an overlap kernel. What
has not been answered is what, if any, are the differences in physical properties of using
the different kernels. For example, overlap fermions are free of $O(a)$ errors irrespective of
the choice of kernel, but in general may have different $O(a^2)$ errors. This will be addressed
in future work.
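The consistency of these figures with the timings in Table II can be checked with a few lines of arithmetic. The sketch below uses only numbers quoted in the text and in Table II; the variable names are hypothetical, and the estimate assumes the total cost is simply (cost per iteration) × (number of iterations), ignoring projection overheads.

```python
# Timings in seconds, taken from Table II.
t_overlap = {"Wilson": 25.52, "FLIC6": 13.67}
t_iteration = {"Wilson": 0.133, "FLIC6": 0.154}   # one Multi-CG iteration, including H
t_kernel = {"Wilson": 0.022, "FLIC6": 0.037}      # one kernel-vector evaluation

measured_speedup = t_overlap["Wilson"] / t_overlap["FLIC6"]
cost_per_iteration_ratio = t_iteration["FLIC6"] / t_iteration["Wilson"]

# Each iteration applies H twice, so the kernel share of a Wilson iteration is:
kernel_fraction_wilson = 2 * t_kernel["Wilson"] / t_iteration["Wilson"]

# Saving in inner CG iterations quoted in the text.
iteration_saving = 0.55
estimated_speedup = 1.0 / (cost_per_iteration_ratio * (1.0 - iteration_saving))
```

The estimate built from the per-iteration cost and the iteration saving lands close to the measured ratio of overlap-vector evaluation times, which supports the statement that the saving in iterations carries over almost directly to compute time.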


V. CONCLUSION 

Practical implementations of the overlap-Dirac operator use a sum over poles to
approximate the matrix sign function. These approximations are evaluated using an iterative
conjugate gradient routine. As each iteration requires about twice as much computational
effort to evaluate as a single evaluation of $H_w$, reducing the number of iterations needed
is the most direct way of reducing the expense of the overlap formalism. To succeed in
this, we select an overlap kernel with an improved condition number motivated by analytic
arguments. From the six candidate actions tested, the FLIC action has the best convergence
properties, requiring fewer low-lying projections than the Wilson action and providing a saving
in iterations by about a factor of 2. This saving in iterations translates almost directly into
a saving in computation time. We restate that only the irrelevant operators are smeared,
and that minimal smearing is required: 6 sweeps at $\alpha = 0.7$ for $\beta = 4.38$, $a = 0.165(2)$, or 4
sweeps at $\alpha = 0.7$ for $\beta = 4.60$, $a = 0.125(2)$. As the FLIC action has only nearest neighbour
couplings, it is well suited to calculations on highly parallel machines. We recognise that
there will be some implementation dependence in our compute-time results, but believe that
this dependence will be sufficiently small that all groups who wish to perform overlap
calculations will benefit in moving from the Wilson to the FLIC kernel. As we have concluded
that the FLIC action is a numerically superior kernel, we can proceed to investigate the
dependence of the overlap action’s physical properties on the kernel action.